Abstract

The  emergence  of  low-cost  sensor  architectures  for  diverse  modalities  has  made  it  possible 
to  deploy  sensor  arrays  that  capture  a  single  event  from  a  large  number  of  vantage  points  and 
using  multiple  modalities.  In  many  scenarios,  these  sensors  acquire  very  high-dimensional  data 
such as audio signals, images, and video. To cope with such high-dimensional data, we typically
rely on low-dimensional models. Manifold models provide a particularly powerful model
that  captures  the  structure  of  high-dimensional  data  when  it  is  governed  by  a  low-dimensional 
set  of  parameters.  However,  these  models  do  not  typically  take  into  account  dependencies 
among  multiple  sensors.  We  thus  propose  a  new  joint  manifold  framework  for  data  ensembles 
that  exploits  such  dependencies.  We  show  that  simple  algorithms  can  exploit  the  joint  manifold 
structure to improve their performance on standard signal processing applications. Additionally,
recent results concerning dimensionality reduction for manifolds enable us to formulate
a  network-scalable  data  compression  scheme  that  uses  random  projections  of  the  sensed  data. 
This  scheme  efficiently  fuses  the  data  from  all  sensors  through  the  addition  of  such  projections, 
regardless  of  the  data  modalities  and  dimensions. 


1  Introduction 

The  geometric  notion  of  a  low-dimensional  manifold  is  a  common,  yet  powerful,  tool  for  modeling 
high-dimensional data. Manifold models arise in cases where (i) a $K$-dimensional parameter $\theta$ can
be identified that carries the relevant information about a signal and (ii) the signal $x_\theta \in \mathbb{R}^N$ changes
as a continuous (typically nonlinear) function of these parameters. Some typical examples include
a  one-dimensional  (1-D)  signal  shifted  by  an  unknown  time  delay  (parameterized  by  the  translation 
variable),  a  recording  of  a  speech  signal  (parameterized  by  the  underlying  phonemes  spoken  by  the 
speaker),  and  an  image  of  a  3-D  object  at  an  unknown  location  captured  from  an  unknown  viewing 
angle  (parameterized  by  the  3-D  coordinates  of  the  object  and  its  roll,  pitch,  and  yaw).  In  these 
and many other cases, the geometry of the signal class forms a nonlinear $K$-dimensional manifold
in $\mathbb{R}^N$,

$$\mathcal{M} = \{ f(\theta) : \theta \in \Theta \}, \tag{1}$$

where $\Theta$ is the $K$-dimensional parameter space [1-3]. Low-dimensional manifolds have also been
proposed  as  approximate  models  for  nonparametric  signal  classes  such  as  images  of  human  faces 
or  handwritten  digits  [4-6]. 

In many scenarios, multiple observations of the same event may be performed simultaneously,
resulting in the acquisition of multiple manifolds that share the same parameter space. For example,
sensor networks (such as camera networks or microphone arrays) typically observe
a single event from a variety of vantage points, while the underlying phenomenon can often be
described by a set of common global parameters (such as the location and orientation of the objects
of interest). Similarly, when sensing a single phenomenon using multiple modalities, such as
video and audio, the underlying phenomenon may again be described by a single parameterization
that spans all modalities. In such cases, we will show that it is advantageous to model this joint
structure contained in the ensemble of manifolds as opposed to simply treating each manifold
independently. Thus we introduce the concept of the joint manifold: a model for the concatenation of
the data vectors observed by the group of sensors. Joint manifolds enable the development of improved
manifold-based learning and estimation algorithms that exploit this structure. Furthermore,
they can be applied to data of any modality and dimensionality.

In  this  work  we  conduct  a  careful  examination  of  the  theoretical  properties  of  joint  manifolds. 
In  particular,  we  compare  joint  manifolds  to  their  component  manifolds  to  see  how  quantities  like 
geodesic distances, curvature, branch separation, and condition number are affected. We then observe
that these properties lead to improved performance and noise-tolerance for a variety of signal
processing  algorithms  when  they  exploit  the  joint  manifold  structure,  as  opposed  to  processing  data 
from  each  manifold  separately.  We  also  illustrate  how  this  joint  manifold  structure  can  be  exploited 
through  a  simple  and  efficient  data  fusion  algorithm  that  uses  random  projections,  which  can  also 
be  applied  to  multimodal  data. 

Related prior work has studied manifold alignment, where the goal is to discover maps between
several datasets that are governed by the same underlying low-dimensional structure. Lafon
et al. proposed an algorithm to obtain a one-to-one matching between data points from several
manifold-modeled classes [7]. The algorithm first applies dimensionality reduction using diffusion
maps to obtain data representations that encode the intrinsic geometry of the class. Then, an
affine function that matches a set of landmark points is computed and applied to the remainder of
the datasets. This concept was extended by Wang and Mahadevan, who apply Procrustes analysis
on the dimensionality-reduced datasets to obtain an alignment function between a pair of manifolds [8].
Since an alignment function is provided instead of a data point matching, the mapping
obtained is applicable for the entire manifold rather than for the set of sampled points. In our setting,
we assume that either (i) the manifold alignment is provided intrinsically via synchronization
between the different sensors or (ii) the manifolds have been aligned using one of the approaches
described above. Our main focus is a theoretical analysis of the benefits provided by analyzing the
joint manifold versus solving our task of interest separately on each of the manifolds observed by
individual sensors.

This paper is organized as follows. Section 2 introduces and establishes some basic properties
of joint manifolds. Section 3 considers the application of joint manifolds to the tasks of classification
and manifold learning. Section 4 then describes an efficient method for processing and
aggregating data when it lies on a joint manifold, and Section 5 concludes with discussion.


2  Joint  manifolds 

In  this  section  we  develop  a  theoretical  framework  for  ensembles  of  manifolds  which  are  jointly 
parameterized  by  a  small  number  of  common  degrees  of  freedom.  Informally,  we  propose  a  data 
structure for jointly modeling such ensembles; this is obtained by concatenating points from different
ensembles that are indexed by the same articulation parameter to obtain a single point in
a higher-dimensional space. We begin by defining the joint manifold for the general setting of
arbitrary topological manifolds.¹

Definition 2.1. Let $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ be an ensemble of $J$ topological manifolds of equal dimension $K$.
Suppose that the manifolds are homeomorphic to each other, in which case there exists a
homeomorphism $\psi_j$ between $\mathcal{M}_1$ and $\mathcal{M}_j$ for each $j$. For a particular set of mappings $\{\psi_j\}_{j=2}^{J}$,
we define the joint manifold as

$$\mathcal{M}^* = \{ (p_1, p_2, \ldots, p_J) \in \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J : p_j = \psi_j(p_1),\ 2 \le j \le J \}.$$

Furthermore, we say that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are the corresponding component manifolds.

Notice that $\mathcal{M}_1$ serves as a common parameter space for all the component manifolds. Since
the component manifolds are homeomorphic to each other, this choice is ultimately arbitrary. In
practice it may be more natural to think of each component manifold as being homeomorphic to
some fixed $K$-dimensional parameter space $\Theta$. However, in this case one could still define $\mathcal{M}^*$
as is done above by defining $\psi_j$ as the composition of the homeomorphic mappings from $\mathcal{M}_1$ to $\Theta$
and from $\Theta$ to $\mathcal{M}_j$.

As an example, consider the one-dimensional manifolds in Figure 1. Figures 1(a) and (b) show
two isomorphic manifolds, where $\mathcal{M}_1 = (0, 2\pi)$ is an open interval, and $\mathcal{M}_2 = \{\psi_2(\theta) : \theta \in \mathcal{M}_1\}$
where $\psi_2(\theta) = (\cos(\theta), \sin(\theta))$, i.e., $\mathcal{M}_2 = S^1 \setminus (1, 0)$ is a circle with one point removed (so that it
remains isomorphic to a line segment). In this case the joint manifold $\mathcal{M}^* = \{(\theta, \cos(\theta), \sin(\theta)) : \theta \in (0, 2\pi)\}$,
illustrated in Figure 1(c), is a helix. Notice that there exist other possible homeomorphic
mappings from $\mathcal{M}_1$ to $\mathcal{M}_2$, and that the precise structure of the joint manifold as a
submanifold of $\mathbb{R}^3$ is heavily dependent on the choice of this mapping.
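The helix example can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `sample_joint_helix` and the uniform sampling of the parameter are choices made here, not part of the paper. It builds points of the joint manifold by concatenating a point of the interval with its image on the circle.

```python
import math

def sample_joint_helix(num_points):
    """Sample the joint manifold {(theta, cos(theta), sin(theta))} for theta in (0, 2*pi).

    Hypothetical helper for illustration: each joint point is the concatenation
    of a point on M_1 (the interval) with its image under psi_2 on M_2 (the circle).
    """
    points = []
    for i in range(1, num_points + 1):
        theta = 2.0 * math.pi * i / (num_points + 1)  # strictly inside (0, 2*pi)
        p1 = (theta,)                                  # point on M_1
        p2 = (math.cos(theta), math.sin(theta))        # psi_2(theta), point on M_2
        points.append(p1 + p2)                         # concatenation gives a point in R^3
    return points

helix = sample_joint_helix(100)
```

Choosing a different homeomorphism in place of `psi_2` (for instance, one that traverses the circle at a non-uniform rate) would produce a different curve in three dimensions, which is the dependence on the mapping noted above.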

¹ A comprehensive introduction to topological manifolds can be found in Boothby [9].

(a) $\mathcal{M}_1 \subset \mathbb{R}$: line segment (b) $\mathcal{M}_2 \subset \mathbb{R}^2$: circle segment (c) $\mathcal{M}^* \subset \mathbb{R}^3$: helix segment

Figure 1: A pair of isomorphic manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$, and the resulting joint manifold $\mathcal{M}^*$.


Returning to the definition of $\mathcal{M}^*$, observe that although we have called $\mathcal{M}^*$ the joint manifold,
we have not shown that it actually forms a topological manifold. To prove that $\mathcal{M}^*$ is indeed a
manifold, we will make use of the fact that the joint manifold is a subset of the product manifold
$\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$. One can show that the product manifold forms a $JK$-dimensional manifold
using the product topology [9]. By comparison, we now show that $\mathcal{M}^*$ has dimension only $K$.

Proposition 2.1. $\mathcal{M}^*$ is a $K$-dimensional submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$.

Proof. We first observe that since $\mathcal{M}^*$ is a subset of the product manifold, we automatically have
that $\mathcal{M}^*$ is a second countable Hausdorff topological space. Thus, all that remains is to show
that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$. Let $p = (p_1, p_2, \ldots, p_J)$ be an arbitrary point on $\mathcal{M}^*$.
Since $p_1 \in \mathcal{M}_1$, we have a pair $(U_1, \phi_1)$ such that $U_1 \subset \mathcal{M}_1$ is an open set containing $p_1$ and
$\phi_1 : U_1 \to V$ is a homeomorphism where $V$ is an open set in $\mathbb{R}^K$. We now define for $2 \le j \le J$
$U_j = \psi_j(U_1)$ and $\phi_j = \phi_1 \circ \psi_j^{-1} : U_j \to V$. Note that for each $j$, $U_j$ is an open set and $\phi_j$ is a
homeomorphism (since $\psi_j$ is a homeomorphism).

Now define $U^* = (U_1 \times U_2 \times \cdots \times U_J) \cap \mathcal{M}^*$. Observe that $U^*$ is an open set and that $p \in U^*$.
Furthermore, let $q = (q_1, q_2, \ldots, q_J)$ be any element of $U^*$. Then $\phi_j(q_j) = \phi_1 \circ \psi_j^{-1}(q_j) = \phi_1(q_1)$
for each $2 \le j \le J$. Thus, since the image of each $q_j \in U_j$ in $V$ under their corresponding $\phi_j$ is
the same, we can form a single homeomorphism $\phi^* : U^* \to V$ by assigning $\phi^*(q) = \phi_1(q_1)$. This
shows that $\mathcal{M}^*$ is locally homeomorphic to $\mathbb{R}^K$ as desired. □

Since $\mathcal{M}^*$ is a submanifold of $\mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$, it also inherits some desirable properties
from its component manifolds.

Proposition 2.2. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are isomorphic topological manifolds and $\mathcal{M}^*$ is
defined as above.

1. If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian, then $\mathcal{M}^*$ is Riemannian.

2. If $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are compact, then $\mathcal{M}^*$ is compact.

Proof. The proofs of these facts are straightforward and follow from the fact that if the component
manifolds are Riemannian or compact, then the product manifold will be as well. $\mathcal{M}^*$ then inherits
these properties as a submanifold of the product manifold [9]. □

Up to this point we have considered general topological manifolds. In particular, we have not
assumed that the component manifolds are embedded in any particular space. If each component
manifold $\mathcal{M}_j$ is embedded in $\mathbb{R}^{N_j}$, the joint manifold is naturally embedded in $\mathbb{R}^{N^*}$ where
$N^* = \sum_{j=1}^{J} N_j$. Hence, the joint manifold can be viewed as a model for data of varying ambient
dimension linked by a common parametrization. In the sequel, we assume that each manifold $\mathcal{M}_j$
is embedded in $\mathbb{R}^N$, which implies that $\mathcal{M}^* \subset \mathbb{R}^{JN}$. Observe that while the intrinsic dimension of
the joint manifold remains constant at $K$, the ambient dimension increases by a factor of $J$. We
now examine how a number of geometric properties of the joint manifold compare to those of the
component manifolds.

We begin with the following simple observation that Euclidean distances between points on
the joint manifold are at least as large as the corresponding distances on the component manifolds. In the remainder of this
paper, whenever we use the notation $\|\cdot\|$ we mean $\|\cdot\|_{\ell_2}$, i.e., the $\ell_2$ (Euclidean) norm on $\mathbb{R}^N$.
When we wish to differentiate this from other $\ell_p$ norms, we will be explicit.

Proposition 2.3. Let $p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be two points on the joint
manifold $\mathcal{M}^*$. Then

$$\|p - q\| = \sqrt{\sum_{j=1}^{J} \|p_j - q_j\|^2}.$$

Proof. This follows from the definition of the Euclidean norm:

$$\|p - q\|^2 = \sum_{i=1}^{JN} (p(i) - q(i))^2 = \sum_{j=1}^{J} \sum_{i=1}^{N} (p_j(i) - q_j(i))^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2.$$

□
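As a quick sanity check of Proposition 2.3, the sketch below concatenates $J$ component vectors and compares the joint Euclidean distance with the root-sum-of-squares of the component distances. The vectors are random stand-ins chosen here for illustration (the identity holds for any concatenated vectors, not only points on a manifold), and all names are hypothetical.

```python
import math
import random

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

random.seed(0)
J, N = 4, 5  # number of components and ambient dimension of each (illustrative values)

# Component vectors p_j and q_j; random placeholders for points on each M_j.
p_parts = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(J)]
q_parts = [[random.gauss(0.0, 1.0) for _ in range(N)] for _ in range(J)]

# Joint points are the concatenations of the component vectors.
p = [x for part in p_parts for x in part]
q = [x for part in q_parts for x in part]

joint_distance = euclidean(p, q)
from_components = math.sqrt(
    sum(euclidean(pj, qj) ** 2 for pj, qj in zip(p_parts, q_parts))
)
```

The two quantities agree up to floating-point error, and the joint distance is never smaller than any single component distance.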

While Euclidean distances are important (especially when noise is introduced), the natural
measure of distance between a pair of points on a Riemannian manifold is not Euclidean distance,
but rather the geodesic distance. The geodesic distance between points $p, q \in \mathcal{M}$ is defined as

$$d_{\mathcal{M}}(p, q) = \inf\{ L(\gamma) : \gamma(0) = p,\ \gamma(1) = q \}, \tag{2}$$

where $\gamma : [0, 1] \to \mathcal{M}$ is a $C^1$-smooth curve joining $p$ and $q$, and $L(\gamma)$ is the length of $\gamma$ as
measured by

$$L(\gamma) = \int_0^1 \|\dot{\gamma}(t)\| \, dt. \tag{3}$$

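In order to see how geodesic distances on $\mathcal{M}^*$ compare to geodesic distances on the component
manifolds, we will make use of the following lemma.

Lemma 2.1. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds, and let $\gamma : [0, 1] \to \mathcal{M}^*$
be a $C^1$-smooth curve on the joint manifold. Then we can write $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_J)$ where each
$\gamma_j : [0, 1] \to \mathcal{M}_j$ is a $C^1$-smooth curve on $\mathcal{M}_j$, and

$$\frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j) \le L(\gamma) \le \sum_{j=1}^{J} L(\gamma_j).$$

Proof. We begin by observing that

$$\|\dot{\gamma}(t)\| = \sqrt{\sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|^2}. \tag{4}$$

For a fixed $t$, let $x_j = \|\dot{\gamma}_j(t)\|$, and observe that $x = (x_1, x_2, \ldots, x_J)$ is a vector in $\mathbb{R}^J$. Thus we may
apply the standard norm inequalities

$$\frac{1}{\sqrt{J}} \|x\|_{\ell_1} \le \|x\|_{\ell_2} \le \|x\|_{\ell_1} \tag{5}$$

to obtain

$$\frac{1}{\sqrt{J}} \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \le \sqrt{\sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|^2} \le \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\|. \tag{6}$$

Combining the right-hand side of (6) with (4) we obtain

$$L(\gamma) \le \int_0^1 \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \, dt = \sum_{j=1}^{J} \int_0^1 \|\dot{\gamma}_j(t)\| \, dt = \sum_{j=1}^{J} L(\gamma_j).$$

Similarly, from the left-hand side of (6) we obtain

$$L(\gamma) \ge \frac{1}{\sqrt{J}} \int_0^1 \sum_{j=1}^{J} \|\dot{\gamma}_j(t)\| \, dt = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} \int_0^1 \|\dot{\gamma}_j(t)\| \, dt = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j).$$

□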
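The bounds of Lemma 2.1 can be checked numerically on the helix example. The following sketch approximates curve lengths by polylines; the helper names, the parameter range $[0.5, 2.5]$, and the step count are assumptions made for illustration only.

```python
import math

def polyline_length(points):
    """Approximate curve length by summing distances between consecutive samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def helix_curve_lengths(a, b, steps=20000):
    """Lengths of gamma_1 (interval), gamma_2 (circle arc), and the joint curve."""
    thetas = [a + (b - a) * i / steps for i in range(steps + 1)]
    gamma_1 = [(t,) for t in thetas]                          # curve on M_1
    gamma_2 = [(math.cos(t), math.sin(t)) for t in thetas]    # curve on M_2
    gamma = [g1 + g2 for g1, g2 in zip(gamma_1, gamma_2)]     # curve on M*
    return polyline_length(gamma_1), polyline_length(gamma_2), polyline_length(gamma)

J = 2
len_1, len_2, len_joint = helix_curve_lengths(0.5, 2.5)
lower = (len_1 + len_2) / math.sqrt(J)  # left-hand bound of Lemma 2.1
upper = len_1 + len_2                   # right-hand bound of Lemma 2.1
print(lower, len_joint, upper)
```

Because the map from the interval to the circle preserves arc length in this example, the joint length sits essentially at the lower bound, while the upper bound is loose.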

We are now in a position to compare geodesic distances on $\mathcal{M}^*$ to those on the component
manifolds.

Theorem 2.1. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian manifolds. Let $p = (p_1, p_2, \ldots, p_J)$
and $q = (q_1, q_2, \ldots, q_J)$ be two points on the corresponding joint manifold $\mathcal{M}^*$. Then

$$d_{\mathcal{M}^*}(p, q) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^{J} d_{\mathcal{M}_j}(p_j, q_j). \tag{7}$$

If the mappings $\psi_2, \psi_3, \ldots, \psi_J$ are isometries, i.e., $d_{\mathcal{M}_1}(p_1, q_1) = d_{\mathcal{M}_j}(\psi_j(p_1), \psi_j(q_1))$ for any $j$
and for any pair of points $(p, q)$, then

$$d_{\mathcal{M}^*}(p, q) = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} d_{\mathcal{M}_j}(p_j, q_j) = \sqrt{J} \cdot d_{\mathcal{M}_1}(p_1, q_1). \tag{8}$$

Proof. If $\gamma$ is a geodesic path between $p$ and $q$, then from Lemma 2.1,

$$d_{\mathcal{M}^*}(p, q) = L(\gamma) \ge \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j).$$

By definition $L(\gamma_j) \ge d_{\mathcal{M}_j}(p_j, q_j)$; hence, this establishes (7).

Now observe that the lower bound in Lemma 2.1 is derived from the lower inequality of (5). This
inequality is attained with equality if and only if each term in the sum is equal, i.e., $L(\gamma_j) = L(\gamma_k)$
for all $j$ and $k$. This is precisely the case when $\psi_2, \ldots, \psi_J$ are isometries. Thus we obtain

$$d_{\mathcal{M}^*}(p, q) = L(\gamma) = \frac{1}{\sqrt{J}} \sum_{j=1}^{J} L(\gamma_j) = \sqrt{J} \, L(\gamma_1).$$

We now conclude that $L(\gamma_1) = d_{\mathcal{M}_1}(p_1, q_1)$ since if we could obtain a shorter path $\gamma_1$ from $p_1$ to
$q_1$ this would contradict the assumption that $\gamma$ is a geodesic on $\mathcal{M}^*$, which establishes (8). □

Next, we study local smoothness and global self-avoidance properties of the joint manifold
using the notion of condition number.

Definition 2.2. [10] Let $\mathcal{M}$ be a Riemannian submanifold of $\mathbb{R}^N$. The condition number is
defined as $1/\tau$, where $\tau$ is the largest number satisfying the following: the open normal bundle
about $\mathcal{M}$ of radius $r$ is embedded in $\mathbb{R}^N$ for all $r < \tau$.

The condition number of a given manifold controls both local smoothness properties and global
properties of the manifold. Intuitively, as $1/\tau$ becomes smaller, the manifold becomes smoother
and more self-avoiding. This is made more precise in the following lemmata.

Lemma 2.2. [10] Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two distinct points on
$\mathcal{M}$, and let $\gamma(t)$ denote a unit speed parameterization of the geodesic path joining $p$ and $q$. Then

$$\max_t \|\ddot{\gamma}(t)\| \le \frac{1}{\tau}.$$

Lemma 2.3. [10] Suppose $\mathcal{M}$ has condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points on $\mathcal{M}$ such
that $\|p - q\| = d$. If $d \le \tau/2$, then the geodesic distance $d_{\mathcal{M}}(p, q)$ is bounded by

$$d_{\mathcal{M}}(p, q) \le \tau \left( 1 - \sqrt{1 - 2d/\tau} \right).$$
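To get a feel for the bound in Lemma 2.3, the sketch below evaluates it on the unit circle, for which $\tau = 1$, the chord between two points separated by angle $\alpha$ has length $2\sin(\alpha/2)$, and the geodesic distance is $\alpha$ itself. The function name `geodesic_bound` and the chosen angles are illustrative assumptions.

```python
import math

def geodesic_bound(d, tau):
    """Upper bound of Lemma 2.3 on the geodesic distance, valid for d <= tau / 2."""
    if d > tau / 2:
        raise ValueError("Lemma 2.3 requires d <= tau / 2")
    return tau * (1.0 - math.sqrt(1.0 - 2.0 * d / tau))

tau = 1.0  # unit circle
checks = []
for alpha in (0.05, 0.1, 0.25, 0.5):       # true geodesic (arc) lengths
    chord = 2.0 * math.sin(alpha / 2.0)    # Euclidean distance between the endpoints
    checks.append((alpha, chord, geodesic_bound(chord, tau)))
```

In each case the arc length lies below the bound, and the bound is tight for short chords and looser as the chord approaches $\tau/2$.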
We wish to show that if the component manifolds are smooth and self-avoiding, the joint manifold
is as well. It is not easy to prove this in the most general case, where the only assumption is
that there exists a homeomorphism (i.e., a continuous bijective map $\psi$) between every pair of manifolds.
However, suppose the manifolds are diffeomorphic, i.e., there exists a continuous bijective
map between tangent spaces at corresponding points on every pair of manifolds. In that case, we
make the following assertion.

Theorem 2.2. Suppose that $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$ are Riemannian submanifolds of $\mathbb{R}^N$, and let $1/\tau_j$
denote the condition number of $\mathcal{M}_j$. Suppose also that the $\psi_2, \psi_3, \ldots, \psi_J$ that define the corresponding
joint manifold $\mathcal{M}^*$ are diffeomorphisms. If $1/\tau^*$ is the condition number of $\mathcal{M}^*$, then

$$\tau^* \ge \min_{1 \le j \le J} \tau_j,$$

or equivalently,

$$\frac{1}{\tau^*} \le \max_{1 \le j \le J} \frac{1}{\tau_j}.$$

Proof. Let $p \in \mathcal{M}^*$, which we can write as $p = (p_1, p_2, \ldots, p_J)$ with $p_j \in \mathcal{M}_j$. Since the
$\{\psi_j\}_{j=2}^{J}$ are diffeomorphisms, we may view $\mathcal{M}^*$ as being diffeomorphic to $\mathcal{M}_1$, i.e., we can build
a diffeomorphic map from $\mathcal{M}_1$ to $\mathcal{M}^*$ as

$$p = \psi^*(p_1) := (p_1, \psi_2(p_1), \ldots, \psi_J(p_1)).$$

We also know that given any two manifolds linked by a diffeomorphism $\psi_j : \mathcal{M}_1 \to \mathcal{M}_j$,
each vector $v_1$ in the tangent space $T_1(p_1)$ of the manifold $\mathcal{M}_1$ at the point $p_1$ is uniquely mapped
to a tangent vector $v_j := \phi_j(v_1)$ in the tangent space $T_j(p_j)$ of the manifold $\mathcal{M}_j$ at the point
$p_j = \psi_j(p_1)$ through the map $\phi_j := \mathcal{J} \circ \psi_j(p_1)$, where $\mathcal{J}$ denotes the Jacobian operator.

Consider the application of this property to the diffeomorphic manifolds $\mathcal{M}_1$ and $\mathcal{M}^*$. In this
case, the tangent vector $v_1 \in T_1(p_1)$ to the manifold $\mathcal{M}_1$ can be uniquely identified with a tangent
vector $v = \phi^*(v_1) \in T^*(p)$ to the manifold $\mathcal{M}^*$. This mapping is expressed as

$$\phi^*(v_1) = \mathcal{J} \circ \psi^*(p_1) = (v_1, \mathcal{J} \circ \psi_2(p_1), \ldots, \mathcal{J} \circ \psi_J(p_1)),$$

since the Jacobian operates componentwise. Therefore, the tangent vector $v$ can be written as

$$v = \phi^*(v_1) = (v_1, \phi_2(v_1), \ldots, \phi_J(v_1)) = (v_1, v_2, \ldots, v_J).$$

In other words, a tangent vector to the joint manifold can be decomposed into $J$ component vectors,
each of which is tangent to the corresponding component manifold.

Using this fact, we now show that a vector $\eta$ that is normal to $\mathcal{M}^*$ can also be broken down into
sub-vectors that are normal to the component manifolds. Consider $p \in \mathcal{M}^*$, and denote $T^*(p)^{\perp}$ as
the normal space at $p$. Suppose $\eta = (\eta_1, \ldots, \eta_J) \in T^*(p)^{\perp}$. Decompose each $\eta_j$ as a projection
onto the component tangent and normal spaces, i.e., for $j = 1, \ldots, J$,

$$\eta_j = x_j + y_j, \quad x_j \in T_j(p_j), \quad y_j \in T_j(p_j)^{\perp},$$

such that $\langle x_j, y_j \rangle = 0$ for each $j$. Let $x = (x_1, \ldots, x_J)$ and $y = (y_1, \ldots, y_J)$. Then $\eta = x + y$, and
since $x$ is tangent to the joint manifold $\mathcal{M}^*$, we have $\langle \eta, x \rangle = \langle x + y, x \rangle = 0$, and thus

$$\langle y, x \rangle = -\|x\|^2.$$

But,

$$\langle y, x \rangle = \sum_{j=1}^{J} \langle y_j, x_j \rangle = 0.$$

Hence $x = 0$, i.e., each $\eta_j$ is normal to $\mathcal{M}_j$.

Armed with this last fact, our goal now is to show that if $r < \min_{1 \le j \le J} \tau_j$ then the normal
bundle of radius $r$ is embedded in $\mathbb{R}^{JN}$, or equivalently, that $p + \eta \ne q + \nu$ provided that $\|\eta\|, \|\nu\| \le r$.
Indeed, suppose $\|\eta\|, \|\nu\| \le r < \min_{1 \le j \le J} \tau_j$. Since $\|\eta_j\| \le \|\eta\|$ and $\|\nu_j\| \le \|\nu\|$ for all
$1 \le j \le J$, we have that $\|\eta_j\|, \|\nu_j\| < \min_{1 \le i \le J} \tau_i \le \tau_j$. Since we have proved that $\eta_j, \nu_j$ are
vectors in the normal bundle of $\mathcal{M}_j$ and their magnitudes are less than $\tau_j$, then $p_j + \eta_j \ne q_j + \nu_j$
by the definition of condition number. Thus $p + \eta \ne q + \nu$ and the result follows. □

Figure 2: Point at which the normal bundle for the helix manifold intersects itself

This result states that for general manifolds, the most we can say is that the condition number
of the joint manifold is guaranteed to be no greater than that of the worst component manifold. However, in practice
the joint manifold is unlikely to be this poorly conditioned. As an example, Figure 2 illustrates the point at which the normal
bundle intersects itself for the case of the joint manifold from Figure 1(c). In this case we obtain
$\tau^* = \sqrt{\pi^2/2 + 1}$. Note that the manifolds $\mathcal{M}_1$ and $\mathcal{M}_2$ generating $\mathcal{M}^*$
have $\tau_1 = \infty$ and $\tau_2 = 1$. Thus, while the condition number in this case is not as good as
the best manifold, it is still notably better than the worst manifold. In general, even this example
may be somewhat pessimistic, and it is possible that in many cases the joint manifold may be better
conditioned than even the best manifold.


3 Joint manifolds in signal processing

Manifold  models  can  be  exploited  by  a  number  of  algorithms  for  signal  processing  tasks  such 
as  pattern  classification,  learning,  and  control  [11].  The  performance  of  such  algorithms  often 
depends  on  geometric  properties  of  the  manifold  model  such  as  its  condition  number  and  geodesic 
distances along its surface. The theory developed in Section 2 suggests that the joint manifold
preserves  or  improves  these  properties.  We  will  now  see  that  when  noise  is  introduced  these  results 
suggest  that,  in  the  case  of  multiple  data  sources,  it  can  be  extremely  beneficial  to  use  algorithms 
specifically  designed  to  exploit  the  joint  manifold  structure. 

3.1  Classification 

We first study the problem of manifold-based classification. The problem is defined as follows:
given manifolds $\mathcal{M}$ and $\mathcal{N}$, suppose we observe a signal $y = x + n \in \mathbb{R}^N$ where either $x \in \mathcal{M}$ or
$x \in \mathcal{N}$ and $n$ is a noise vector, and we wish to find a function $f : \mathbb{R}^N \to \{\mathcal{M}, \mathcal{N}\}$ that attempts
to determine which manifold “generated” $y$. We consider a simple classification algorithm based
on the generalized maximum likelihood framework described in [12]. The approach is to classify
by computing the distance from the observed signal $y$ to each of the manifolds, and then classify
based on which of these distances is smallest, i.e., our classifier is

$$f(y) = \arg\min \left[ d(y, \mathcal{M}), d(y, \mathcal{N}) \right]. \tag{9}$$
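A minimal sketch of the nearest-manifold classifier in (9) follows, under the assumption that each manifold is available only through a finite set of samples, so that the distance to the manifold is approximated by the distance to the nearest sample. The two concentric circles and all function names are hypothetical choices for illustration.

```python
import math

def distance_to_manifold(y, samples):
    """Approximate d(y, M) by the distance from y to the nearest sample of M."""
    return min(math.dist(y, s) for s in samples)

def classify(y, manifolds):
    """Return the label of the sampled manifold closest to y, as in (9)."""
    return min(manifolds, key=lambda label: distance_to_manifold(y, manifolds[label]))

thetas = [2.0 * math.pi * i / 200 for i in range(200)]
manifolds = {
    "M": [(math.cos(t), math.sin(t)) for t in thetas],              # unit circle
    "N": [(3.0 * math.cos(t), 3.0 * math.sin(t)) for t in thetas],  # circle of radius 3
}

y = (1.2, 0.3)  # a noisy observation near the unit circle
label = classify(y, manifolds)
```

With denser sampling the approximation of each distance improves; the decision rule itself is unchanged.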

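We will measure the performance of this algorithm for a particular pair of manifolds by considering
the probability of misclassifying a point from $\mathcal{M}$ as belonging to $\mathcal{N}$, which we denote $P_{\mathcal{MN}}$.
To analyze this problem, we employ three common notions of separation in metric spaces:

• The minimum separation distance between two manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as

$$\delta(\mathcal{M}, \mathcal{N}) = \inf_{p \in \mathcal{M}} d(p, \mathcal{N}).$$

• The Hausdorff distance from $\mathcal{M}$ to $\mathcal{N}$ is defined to be

$$D(\mathcal{M}, \mathcal{N}) = \sup_{p \in \mathcal{M}} d(p, \mathcal{N}),$$

with $D(\mathcal{N}, \mathcal{M})$ defined similarly. Note that $\delta(\mathcal{M}, \mathcal{N}) = \delta(\mathcal{N}, \mathcal{M})$, while in general
$D(\mathcal{M}, \mathcal{N}) \ne D(\mathcal{N}, \mathcal{M})$.

• The maximum separation distance between manifolds $\mathcal{M}$ and $\mathcal{N}$ is defined as

$$\Delta(\mathcal{M}, \mathcal{N}) = \sup_{x \in \mathcal{M}} \sup_{y \in \mathcal{N}} \|x - y\|.$$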
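The three separation notions above can be computed directly when each manifold is replaced by a finite point set, with infima and suprema becoming minima and maxima. The sketch below does this for two small hand-picked point sets; the sets and function names are assumptions made for illustration.

```python
import math

def dist_to_set(p, points):
    """Distance from point p to the nearest element of a finite point set."""
    return min(math.dist(p, q) for q in points)

def min_separation(M, N):
    """delta(M, N): smallest distance from a point of M to the set N."""
    return min(dist_to_set(p, N) for p in M)

def hausdorff_from(M, N):
    """D(M, N): largest distance from a point of M to the set N (directed)."""
    return max(dist_to_set(p, N) for p in M)

def max_separation(M, N):
    """Delta(M, N): largest distance between a point of M and a point of N."""
    return max(math.dist(p, q) for p in M for q in N)

M = [(0.0, 0.0), (1.0, 0.0)]
N = [(0.0, 2.0), (4.0, 0.0), (1.0, 1.0)]

delta = min_separation(M, N)
D_MN = hausdorff_from(M, N)
D_NM = hausdorff_from(N, M)
Delta = max_separation(M, N)
```

For these point sets the minimum separation is symmetric in its arguments, while the directed Hausdorff distances from `M` to `N` and from `N` to `M` differ, matching the remark in the second bullet.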

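As one might expect, $P_{\mathcal{MN}}$ is controlled by the separation distances. For example, suppose that
$x \in \mathcal{M}$; if the noise vector $n$ is bounded and satisfies $\|n\| < \delta(\mathcal{M}, \mathcal{N})/2$, then we have that
$d(y, \mathcal{M}) \le \|n\| < \delta(\mathcal{M}, \mathcal{N})/2$ and hence

$$\begin{aligned}
\delta(\mathcal{M}, \mathcal{N}) &= \inf_{p \in \mathcal{M}, q \in \mathcal{N}} \|p - q\| \\
&= \inf_{p \in \mathcal{M}, q \in \mathcal{N}} \|p - y + y - q\| \\
&\le \inf_{p \in \mathcal{M}, q \in \mathcal{N}} \left( \|p - y\| + \|y - q\| \right) \\
&= d(y, \mathcal{M}) + d(y, \mathcal{N}) \\
&< \delta(\mathcal{M}, \mathcal{N})/2 + d(y, \mathcal{N}).
\end{aligned}$$

Thus we are guaranteed that

$$d(y, \mathcal{N}) > \delta(\mathcal{M}, \mathcal{N})/2.$$

Therefore, $d(y, \mathcal{M}) < d(y, \mathcal{N})$ and the classifier defined by (9) satisfies $P_{\mathcal{MN}} = 0$. We can refine
this result in two possible ways. First, note that the amount of noise $\epsilon$ that we can tolerate without
making an error depends on $x$. Specifically, for a given $x \in \mathcal{M}$, provided that $\|n\| < d(x, \mathcal{N})/2$ we
still have that $P_{\mathcal{MN}} = 0$. Thus, for a given $x \in \mathcal{M}$ we can tolerate noise bounded by
$d(x, \mathcal{N})/2 \in [\delta(\mathcal{M}, \mathcal{N})/2, D(\mathcal{M}, \mathcal{N})/2]$.

A second possible refinement that we will explore below is to ignore this dependence on $x$, but
to extend our noise model to the case where $\|n\| > \delta(\mathcal{M}, \mathcal{N})/2$ with non-zero probability. We can
still bound $P_{\mathcal{MN}}$ since

$$P_{\mathcal{MN}} \le P\left( \|n\| > \delta(\mathcal{M}, \mathcal{N})/2 \right). \tag{10}$$

We provide bounds on this probability for both the component manifolds and the joint manifold
as follows: we first compare the separation distances for these cases.

Theorem 3.1. Consider the joint manifolds $\mathcal{M}^* \subset \mathcal{M}_1 \times \mathcal{M}_2 \times \cdots \times \mathcal{M}_J$ and $\mathcal{N}^* \subset \mathcal{N}_1 \times \mathcal{N}_2 \times \cdots \times \mathcal{N}_J$.
Then, the following bounds hold:

1. Joint minimum separation:

$$\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \le \delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \min_{1 \le k \le J} \left( \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j) \right). \tag{11}$$

2. Joint Hausdorff separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:

$$\max_{1 \le k \le J} \left( D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \right) \le D^2(\mathcal{M}^*, \mathcal{N}^*) \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j). \tag{12}$$

3. Joint maximum separation from $\mathcal{M}^*$ to $\mathcal{N}^*$:

$$\max_{1 \le k \le J} \left( \Delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j) \right) \le \Delta^2(\mathcal{M}^*, \mathcal{N}^*) \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j). \tag{13}$$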
Proof. Inequality (11) is a simple corollary of Proposition 2.3. Let $p = (p_1, p_2, \ldots, p_J)$ and
$q = (q_1, q_2, \ldots, q_J)$ respectively be the points on $\mathcal{M}^*$ and $\mathcal{N}^*$ for which the minimum separation
distance is attained, i.e.,

$$(p, q) = \arg \inf_{p \in \mathcal{M}^*} \inf_{q \in \mathcal{N}^*} \|p - q\|.$$

Then,

$$\delta^2(\mathcal{M}^*, \mathcal{N}^*) = \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \ge \sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j),$$

since the distance between two points in any given component space is at least the minimum
separation distance corresponding to that space. This establishes the lower bound in (11). We
obtain the upper bound by selecting a $k$, and selecting $p \in \mathcal{M}^*$ and $q \in \mathcal{N}^*$ such that $p_k$ and $q_k$
attain the minimum separation distance $\delta(\mathcal{M}_k, \mathcal{N}_k)$. From the definition of $\delta(\mathcal{M}^*, \mathcal{N}^*)$, we have
that

$$\begin{aligned}
\delta^2(\mathcal{M}^*, \mathcal{N}^*) &\le \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \\
&= \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&\le \delta^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \Delta^2(\mathcal{M}_j, \mathcal{N}_j),
\end{aligned}$$

and since this holds for every choice of $k$, (11) follows by taking the minimum over all $k$.

To prove inequality (12), we follow a similar course. We begin by selecting $p \in \mathcal{M}^*$ and
$q \in \mathcal{N}^*$ that satisfy

$$(p, q) = \arg \sup_{p \in \mathcal{M}^*} \inf_{q \in \mathcal{N}^*} \|p - q\|.$$

Then,

$$D^2(\mathcal{M}^*, \mathcal{N}^*) = \|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2 \le \sum_{j=1}^{J} \Delta^2(\mathcal{M}_j, \mathcal{N}_j),$$

which establishes the upper bound in (12). To obtain the lower bound, we again select a $k$, and
now let $p \in \mathcal{M}^*$ be the point at which the Hausdorff separation for
the component manifold $\mathcal{M}_k$ is attained, i.e., the corresponding point $p_k$ is as far away from $\mathcal{N}_k$
as is possible in $\mathcal{M}_k$. Let $q \in \mathcal{N}^*$ be the nearest point in $\mathcal{N}^*$ to $p$. From the definition of the
Hausdorff distance, we get that

$$D(\mathcal{M}^*, \mathcal{N}^*) \ge \|p - q\|,$$

since the Hausdorff distance is the maximal distance between the points in $\mathcal{M}^*$ and their respective
nearest neighbors in $\mathcal{N}^*$. Again, it also follows that

$$\begin{aligned}
D^2(\mathcal{M}^*, \mathcal{N}^*) &\ge \|p - q\|^2 = \|p_k - q_k\|^2 + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&\ge D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \|p_j - q_j\|^2 \\
&\ge D^2(\mathcal{M}_k, \mathcal{N}_k) + \sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j),
\end{aligned}$$

where the second line uses the fact that $\|p_k - q_k\|$ is at least the distance from $p_k$ to $\mathcal{N}_k$, which equals $D(\mathcal{M}_k, \mathcal{N}_k)$.
Since this again holds for every choice of $k$, (12) follows by taking the maximum over all $k$.

One can prove (13) using the same technique used to prove (11). □
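The joint minimum separation bound (11) can be checked numerically on finite point sets. In the sketch below, each component "manifold" is replaced by a small cloud of random points, and the shared sample index plays the role of the common parameter; these clouds, their dimensions, and the helper names are assumptions made purely for illustration.

```python
import math
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def min_sep_sq(M, N):
    """Squared minimum separation between two finite point sets."""
    return min(sq_dist(p, q) for p in M for q in N)

def max_sep_sq(M, N):
    """Squared maximum separation between two finite point sets."""
    return max(sq_dist(p, q) for p in M for q in N)

def sample_components(J, n_points, dim, center, rng):
    """J point clouds; sample i of every cloud shares the same (implicit) parameter."""
    return [[[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_points)]
            for _ in range(J)]

def concatenate(parts):
    """Joint samples: concatenate the i-th sample of every component."""
    n_points = len(parts[0])
    return [[x for part in parts for x in part[i]] for i in range(n_points)]

rng = random.Random(1)
J = 3
M_parts = sample_components(J, 30, 4, 0.0, rng)
N_parts = sample_components(J, 30, 4, 5.0, rng)
M_joint, N_joint = concatenate(M_parts), concatenate(N_parts)

delta_sq = [min_sep_sq(M_parts[j], N_parts[j]) for j in range(J)]
Delta_sq = [max_sep_sq(M_parts[j], N_parts[j]) for j in range(J)]

lower = sum(delta_sq)                                   # left-hand side of (11)
joint = min_sep_sq(M_joint, N_joint)                    # squared joint separation
upper = min(delta_sq[k] + sum(Delta_sq[j] for j in range(J) if j != k)
            for k in range(J))                          # right-hand side of (11)
```

The same argument used in the proof applies verbatim to finite sets, so `lower <= joint <= upper` holds up to floating-point error for any choice of seed.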


As  an  example,  if  we  consider  the  case  where  the  separation  distances  are  constant  for  all  j, 
then  the  joint  minimum  separation  distance  satisfies 

y/J6(M1,M'1)<6(M*,Ar*)  <  y/52(MuAfi)  +  (A  -  l)A2(A^i,M) 

<  6{Mi,Mi)  +  y/T=lA{Mi,Ni) 


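In the case where $\delta(\mathcal{M}_1, \mathcal{N}_1) \ll \Delta(\mathcal{M}_1, \mathcal{N}_1)$, we
observe that $\delta(\mathcal{M}^*, \mathcal{N}^*)$ can be considerably larger than
$\sqrt{J}\,\delta(\mathcal{M}_1, \mathcal{N}_1)$. This means that we can potentially tolerate much more
noise while ensuring $P_{\mathcal{M}^*\mathcal{N}^*} = 0$. To see this, write $n = (n_1, n_2, \ldots, n_J)$
and recall that we require $\|n_j\| < \epsilon = \delta(\mathcal{M}_j, \mathcal{N}_j)/2$ to ensure that
$P_{\mathcal{M}_j\mathcal{N}_j} = 0$. Thus, if we require that $P_{\mathcal{M}_j\mathcal{N}_j} = 0$ for all
$j$, then we have that

$$\|n\| = \sqrt{\sum_{j=1}^{J} \|n_j\|^2} < \sqrt{J}\,\epsilon = \sqrt{J}\,\delta(\mathcal{M}_1, \mathcal{N}_1)/2.$$

However, if we instead only require that $P_{\mathcal{M}^*\mathcal{N}^*} = 0$, we only need
$\|n\| < \delta(\mathcal{M}^*, \mathcal{N}^*)/2$, which can be a significantly less stringent requirement.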
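To make the comparison concrete, the following Python sketch evaluates the bounds on the joint minimum separation distance for the constant-separation case, together with the corresponding noise tolerances. The function names and the numerical values of `J`, `delta`, and `Delta` are hypothetical and chosen only for illustration.

```python
import math


def joint_separation_bounds(J, delta, Delta):
    """Lower and upper bounds on delta(M*, N*) when every component pair
    has minimum separation `delta` and maximum separation `Delta`."""
    lower = math.sqrt(J) * delta
    upper = math.sqrt(delta ** 2 + (J - 1) * Delta ** 2)
    return lower, upper


def noise_tolerances(J, delta, Delta):
    """Largest admissible ||n|| under the per-sensor requirement, and under
    the joint requirement in the most favorable case, i.e. if delta(M*, N*)
    happens to attain its upper bound."""
    _, upper = joint_separation_bounds(J, delta, Delta)
    per_sensor = math.sqrt(J) * delta / 2.0   # from ||n_j|| < delta / 2 for all j
    joint_best_case = upper / 2.0             # from ||n|| < delta(M*, N*) / 2
    return per_sensor, joint_best_case


if __name__ == "__main__":
    J, delta, Delta = 16, 0.1, 5.0  # hypothetical values with delta << Delta
    print(joint_separation_bounds(J, delta, Delta))
    print(noise_tolerances(J, delta, Delta))
```

Because the lower bound equals twice the per-sensor tolerance, the joint requirement is never more stringent than the per-sensor one in this setting; how much is gained depends on where the joint separation actually falls between the two bounds.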

The benefit of classification using the joint manifold is made more apparent when we extend
our noise model to the case where we allow $\|n_j\| > \delta(\mathcal{M}_j, \mathcal{N}_j)/2$ with non-zero
probability and apply (10). To bound the probability in (10), we will make use of the following
adaptation of Hoeffding's inequality [13].


Lemma 3.1. Suppose that $n_j \in \mathbb{R}^N$ is a random vector that satisfies $\|n_j\|^2 \le \epsilon$,
for $j = 1, 2, \ldots, J$. Suppose also that the $n_j$ are independent and identically distributed (i.i.d.)
with $E[\|n_j\|^2] = \sigma^2$. Then if $n = (n_1, n_2, \ldots, n_J) \in \mathbb{R}^{JN}$, we have that for
any $\lambda > 0$,

$$P\left(\|n\|^2 > J(\sigma^2 + \lambda)\right) \le \exp\left(-\frac{2J\lambda^2}{\epsilon^2}\right).$$

Using this lemma we can relax the assumption on $\epsilon$ so that we only require that it is finite,
and instead make the weaker assumption that $\sqrt{E[\|n\|^2]} = \sqrt{J}\sigma < \delta(\mathcal{M}, \mathcal{N})/2$
for a particular pair of manifolds $\mathcal{M}$, $\mathcal{N}$. This assumption ensures that
$\lambda = \delta^2(\mathcal{M}, \mathcal{N})/(4J) - \sigma^2 > 0$, so that we can combine Lemma 3.1 with (10)
to obtain a bound on $P_{\mathcal{M}\mathcal{N}}$. Note that if this condition does not hold, then this is
a very difficult classification problem, since the typical norm of the noise is large enough to push us
closer to the other manifold, in which case the simple classifier introduced above makes little sense.
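As a sanity check on the kind of tail bound used here, the following Python sketch compares a Monte Carlo estimate of $P(\|n\|^2 > J(\sigma^2 + \lambda))$ with a Hoeffding-type bound. It assumes the standard Hoeffding form $\exp(-2J\lambda^2/\epsilon^2)$ for i.i.d. variables $\|n_j\|^2 \in [0, \epsilon]$; the uniform noise model, the function names, and all parameter values are illustrative assumptions rather than part of the analysis above.

```python
import math
import random


def empirical_tail(J, N, eps, lam, trials, rng):
    """Estimate P(||n||^2 > J (sigma^2 + lam)) for noise blocks n_j with
    entries drawn uniformly from [-a, a], where a is chosen so that
    ||n_j||^2 <= N a^2 = eps always holds (assumed noise model)."""
    a = math.sqrt(eps / N)
    sigma2 = N * a * a / 3.0          # E||n_j||^2 for uniform entries on [-a, a]
    threshold = J * (sigma2 + lam)
    exceed = 0
    for _ in range(trials):
        total = 0.0
        for _ in range(J * N):
            u = rng.uniform(-a, a)
            total += u * u
        if total > threshold:
            exceed += 1
    return exceed / trials


def hoeffding_bound(J, eps, lam):
    """Hoeffding bound for a sum of J i.i.d. variables with range [0, eps]."""
    return math.exp(-2.0 * J * lam * lam / (eps * eps))


if __name__ == "__main__":
    rng = random.Random(0)
    for J in (2, 8, 32):
        print(J, empirical_tail(J, 4, 1.0, 0.1, 2000, rng), hoeffding_bound(J, 1.0, 0.1))
```

The bound is typically loose for this noise model, but both the estimate and the bound shrink as $J$ grows, which is the behavior the subsequent comparison relies on.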

We now illustrate how Lemma 3.1 can be used to compare error bounds between classification
using a joint manifold and classification using a particular pair of component manifolds
$\mathcal{M}_k$, $\mathcal{N}_k$.


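Theorem 3.2. Suppose that we observe a vector $y = x + n$ where $x \in \mathcal{M}^*$ and
$n = (n_1, n_2, \ldots, n_J)$ is a random vector such that $\|n_j\|^2 \le \epsilon$, for $j = 1, 2, \ldots, J$,
and that the $n_j$ are i.i.d. with $E[\|n_j\|^2] = \sigma^2$, where $\sigma < \delta(\mathcal{M}_k, \mathcal{N}_k)/2$. If

$$\delta(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\delta(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}} \tag{14}$$

and we classify the observation $y$ according to the classifier introduced above, then

$$P_{\mathcal{M}^*\mathcal{N}^*} \le \exp\left(-\frac{2c^*}{\epsilon^2}\right) \tag{15}$$

and

$$P_{\mathcal{M}_k\mathcal{N}_k} \le \exp\left(-\frac{2c_k}{\epsilon^2}\right) \tag{16}$$

such that $c^* \ge c_k$.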


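Proof. First, observe that

$$\frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \ge \delta^2(\mathcal{M}_k, \mathcal{N}_k) > 4\sigma^2. \tag{17}$$

Thus, we may set $\lambda = \delta^2(\mathcal{M}^*, \mathcal{N}^*)/(4J) - \sigma^2 > 0$ and apply Lemma 3.1 to
obtain (15) with

$$c^* = J\left(\frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{4J} - \sigma^2\right)^2.$$

Similarly, we may again apply Lemma 3.1 by setting $\lambda = \delta^2(\mathcal{M}_k, \mathcal{N}_k)/4 - \sigma^2 > 0$
and $J = 1$ to obtain (16) with

$$c_k = \left(\frac{\delta^2(\mathcal{M}_k, \mathcal{N}_k)}{4} - \sigma^2\right)^2.$$

It remains to show that $c^* \ge c_k$. Thus, observe that

$$\begin{aligned}
\delta^2(\mathcal{M}_k, \mathcal{N}_k) &\le \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J}
= \frac{\sqrt{J}\,\delta^2(\mathcal{M}^*, \mathcal{N}^*) - (\sqrt{J} - 1)\,\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \\
&= \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}} - (\sqrt{J} - 1)\frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J} \\
&\le \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{\sqrt{J}} - 4\sigma^2(\sqrt{J} - 1),
\end{aligned}$$

where the last inequality follows from (17). Rearranging terms, we obtain

$$\frac{\delta^2(\mathcal{M}_k, \mathcal{N}_k)}{4} - \sigma^2 \le \sqrt{J}\left(\frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{4J} - \sigma^2\right).$$

Thus,

$$\sqrt{c_k} \le \sqrt{c^*},$$

and since $\sqrt{c_k} = \delta^2(\mathcal{M}_k, \mathcal{N}_k)/4 - \sigma^2 > 0$ by assumption, we obtain

$$c_k \le c^*,$$

as desired. □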

This  result  can  be  weakened  slightly  to  obtain  the  following  corollary. 

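Corollary 3.1. Suppose that we observe a vector $y = x + n$ where $x \in \mathcal{M}^*$ and
$n = (n_1, n_2, \ldots, n_J)$ is a random vector such that $\|n_j\|^2 \le \epsilon$, for $j = 1, 2, \ldots, J$,
and that the $n_j$ are i.i.d. with $E[\|n_j\|^2] = \sigma^2$, where $\sigma < \delta(\mathcal{M}_k, \mathcal{N}_k)/2$. If

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j \ne k} \delta^2(\mathcal{M}_j, \mathcal{N}_j)}{J - 1} \tag{18}$$

and we classify according to the classifier introduced above, then (15) and (16) hold with the same
constants as in Theorem 3.2.

Proof. We can rewrite (18) as

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j) - \delta^2(\mathcal{M}_k, \mathcal{N}_k)}{J - 1}.$$

After rearranging terms, this reduces to

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\sum_{j=1}^{J} \delta^2(\mathcal{M}_j, \mathcal{N}_j)}{J}.$$

Applying (11) from Theorem 3.1 we obtain

$$\delta^2(\mathcal{M}_k, \mathcal{N}_k) \le \frac{\delta^2(\mathcal{M}^*, \mathcal{N}^*)}{J},$$

which allows us to apply Theorem 3.2 to prove the desired result.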


□

Corollary 3.1 shows that we can expect joint classification to outperform the $k$-th individual
classifier whenever the squared separation distance for the $k$-th component manifolds is not
too much larger than the average squared separation distance among the remaining component
manifolds. Thus, we can expect that the joint classifier is outperforming most of the individual
classifiers, but it is still possible that some of the individual classifiers are doing better. Of course,
if one were able to know in advance which classifiers were best, then one would only use data
from the best sensors. We expect that a more typical situation is when the separation distances are
(approximately) equal across all sensors, in which case the condition in (18) is true for all of the
component manifolds.
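The comparison can be explored numerically. The Python sketch below checks the average-separation condition for a chosen sensor $k$ and evaluates exponents of the form $c^* = J(\delta^2(\mathcal{M}^*, \mathcal{N}^*)/(4J) - \sigma^2)^2$ and $c_k = (\delta^2(\mathcal{M}_k, \mathcal{N}_k)/4 - \sigma^2)^2$, with error bounds assumed to take the form $\exp(-2c/\epsilon^2)$. Since the joint separation is not known in closed form, the sketch substitutes the sum of squared component separations, which is the lower bound on $\delta^2(\mathcal{M}^*, \mathcal{N}^*)$ discussed earlier; the separation values, `sigma2`, `eps`, and all function names are hypothetical.

```python
import math


def average_condition(deltas, k):
    """True if delta_k^2 is at most the average squared separation of the
    remaining component manifolds."""
    others = [d * d for j, d in enumerate(deltas) if j != k]
    return deltas[k] ** 2 <= sum(others) / len(others)


def error_exponents(deltas, k, sigma2):
    """Return (c_star, c_k). The joint squared separation is replaced by
    sum_j delta_j^2, a conservative stand-in (assumption of this sketch)."""
    J = len(deltas)
    joint_sq = sum(d * d for d in deltas)
    c_star = J * (joint_sq / (4.0 * J) - sigma2) ** 2
    c_k = (deltas[k] ** 2 / 4.0 - sigma2) ** 2
    return c_star, c_k


def error_bound(c, eps):
    """Bound of the assumed form exp(-2 c / eps^2)."""
    return math.exp(-2.0 * c / eps ** 2)


if __name__ == "__main__":
    deltas = [1.0, 1.2, 0.9, 1.1]   # hypothetical component separations
    sigma2, eps, k = 0.1, 0.5, 2
    c_star, c_k = error_exponents(deltas, k, sigma2)
    print(average_condition(deltas, k), error_bound(c_star, eps), error_bound(c_k, eps))
```

Note that the condition is only sufficient: for a sensor with above-average separation the check can fail even though the joint exponent is still the larger of the two.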

3.2  Manifold  learning 

In contrast to the classification scenario described above, where we knew the manifold structure a
priori, we now consider manifold learning algorithms that attempt to learn the manifold structure
by constructing a (possibly nonlinear) embedding of a given point cloud into a subset of $\mathbb{R}^L$,
where $L < N$. Typically, $L$ is set to $K$, the intrinsic manifold dimension. Several such algorithms have
been proposed, each giving rise to a nonlinear map with its own special properties and advantages
(e.g., Isomap [14], Locally Linear Embedding (LLE) [15], Hessian Eigenmaps [16], etc.). Such
algorithms provide a powerful framework for navigation, visualization, and interpolation of
high-dimensional data. For instance, manifold learning can be employed in the inference of articulation
parameters (e.g., 3-D pose) of points sampled from an image appearance manifold.

In particular, the Isomap algorithm deserves special mention. It assumes that the point cloud
consists of samples from a data manifold that is (at least approximately) isometric to a convex
subset of Euclidean space. In this case, there exists an isometric mapping $f$ from a parameter
space $\Theta \subset \mathbb{R}^K$ to the manifold $\mathcal{M}$ such that the geodesic distance between
every pair of data points is equal to the $\ell_2$ distance between their corresponding pre-images in
$\Theta$. In essence, Isomap attempts to discover the coordinate structure of that $K$-dimensional space.

Isomap  works  in  three  stages: 

•  We  construct  a  graph  G  that  contains  one  vertex  for  each  input  data  point;  an  edge  connects 
two  vertices  if  the  Euclidean  distance  between  the  corresponding  data  points  is  below  a 
specified  threshold. 

•  We  weight  each  edge  in  the  graph  G  by  computing  the  Euclidean  distance  between  the 
corresponding  data  points.  We  then  estimate  the  geodesic  distance  between  each  pair  of 
vertices  as  the  length  of  the  shortest  path  between  the  corresponding  vertices  in  the  graph  G. 

•  We embed the points in $\mathbb{R}^K$ using multidimensional scaling (MDS), which attempts to embed
the  points  so  that  their  Euclidean  distance  approximates  the  geodesic  distances  estimated  in 
the  previous  step. 
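The first two stages listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference Isomap implementation: it uses a fixed distance threshold for the graph, the Floyd-Warshall algorithm for shortest paths (any shortest-path method would do), and omits the MDS stage. All function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def neighborhood_graph(points, threshold):
    """Stage 1: weighted adjacency matrix; vertices i and j are joined by an
    edge when their Euclidean distance is below `threshold`."""
    n = len(points)
    W = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.0
        for j in range(i + 1, n):
            d = euclidean(points[i], points[j])
            if d < threshold:
                W[i][j] = W[j][i] = d
    return W


def geodesic_estimates(W):
    """Stage 2: all-pairs shortest-path lengths (Floyd-Warshall), used as
    estimates of the geodesic distances along the manifold."""
    n = len(W)
    D = [row[:] for row in W]
    for m in range(n):
        for i in range(n):
            if D[i][m] == math.inf:
                continue
            for j in range(n):
                alt = D[i][m] + D[m][j]
                if alt < D[i][j]:
                    D[i][j] = alt
    return D


if __name__ == "__main__":
    # Samples from a unit semicircle: the endpoints are 2 apart in Euclidean
    # distance but pi apart along the curve.
    pts = [(math.cos(math.pi * i / 49), math.sin(math.pi * i / 49)) for i in range(50)]
    D = geodesic_estimates(neighborhood_graph(pts, 0.1))
    print(euclidean(pts[0], pts[-1]), D[0][-1])
```

On the semicircle example the graph-based estimate is close to the arc length, whereas the direct Euclidean distance between the endpoints underestimates it substantially.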

A crucial component of the MDS algorithm is a suitable linear transformation of the matrix of
squared geodesic distances $D$; the rank-$K$ approximation of this new matrix yields the best
possible $K$-dimensional coordinate structure of the input sample points in a mean-squared sense.
Further results on the performance of Isomap in terms of geometric properties of the underlying
manifold can be found in [17].
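In classical MDS, the linear transformation applied to the squared-distance matrix is commonly taken to be double centering, $B = -\tfrac{1}{2} H D H$ with $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T$. The Python sketch below assumes that this is the transformation intended here and verifies, on a small hypothetical point set, that double centering squared Euclidean distances recovers the Gram matrix of the mean-centered points. The function names are illustrative.

```python
def double_center(D2):
    """Apply B = -1/2 * H * D2 * H to a symmetric matrix of squared
    distances, where H is the centering matrix I - (1/n) 1 1^T."""
    n = len(D2)
    row_mean = [sum(row) / n for row in D2]
    total_mean = sum(row_mean) / n
    # D2 is symmetric, so column means coincide with row means.
    return [[-0.5 * (D2[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    return [[sum((a - b) ** 2 for a, b in zip(u, v)) for v in points]
            for u in points]


def centered_gram(points):
    """Gram matrix of the points after subtracting their mean."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[c] for p in points) / n for c in range(dim)]
    centered = [[p[c] - mean[c] for c in range(dim)] for p in points]
    return [[sum(a * b for a, b in zip(u, v)) for v in centered] for u in centered]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]  # hypothetical samples
    print(double_center(squared_distances(pts)))
    print(centered_gram(pts))
```

A rank-$K$ approximation of the centered matrix (via its leading eigenvectors) would then give the embedding coordinates; that step is omitted here.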

We examine the performance of manifold learning using Isomap with samples of the joint
manifold, as compared to learning any of the component manifolds. We first assume that we are
given noiseless samples from the $J$ isometric component manifolds $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_J$.
In order to judge the quality of the embedding learned by the Isomap algorithm, we will observe that for
any pair of points $p, q$ on a manifold $\mathcal{M}$, we have that

$$\rho \le \frac{\|p - q\|}{d_{\mathcal{M}}(p, q)} \le 1 \tag{19}$$

for some $\rho \in [0, 1]$ that will depend on $p, q$. Isomap will perform well if the largest value of
$\rho$ that satisfies (19) for any pair of samples that are connected by an edge in the graph $G$ is close
to 1. Using this result, we can compare the performance of manifold learning using Isomap on samples
from the joint manifold $\mathcal{M}^*$ to using Isomap on samples from a particular component manifold
$\mathcal{M}_k$.


Theorem 3.3. Let $\mathcal{M}^*$ be a joint manifold from $J$ isometric component manifolds. Let
$p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ denote a pair of samples of $\mathcal{M}^*$ and
suppose that we are given a graph $G$ that contains one vertex for each sample. For each
$j = 1, \ldots, J$, define $\rho_j$ as the largest value such that

$$\rho_j \le \frac{\|p_j - q_j\|}{d_{\mathcal{M}_j}(p_j, q_j)} \le 1 \tag{20}$$

for all pairs of points connected by an edge in $G$. Then we have that

$$\sqrt{\frac{\sum_{j=1}^{J} \rho_j^2}{J}} \le \frac{\|p - q\|}{d_{\mathcal{M}^*}(p, q)} \le 1.$$


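Proof. By Proposition 2.3,

$$\|p - q\|^2 = \sum_{j=1}^{J} \|p_j - q_j\|^2,$$

and from Theorem 2.1 we have that

$$d^2_{\mathcal{M}^*}(p, q) = J\, d^2_{\mathcal{M}_1}(p_1, q_1). \tag{21}$$

Thus,

$$\frac{\|p - q\|^2}{d^2_{\mathcal{M}^*}(p, q)}
= \frac{\sum_{j=1}^{J} \|p_j - q_j\|^2}{J\, d^2_{\mathcal{M}_1}(p_1, q_1)}
= \frac{1}{J} \sum_{j=1}^{J} \frac{\|p_j - q_j\|^2}{d^2_{\mathcal{M}_j}(p_j, q_j)}
\ge \frac{1}{J} \sum_{j=1}^{J} \rho_j^2,$$

which establishes the lower bound in the theorem. The upper bound is trivial since we always have that
$d_{\mathcal{M}^*}(p, q) \ge \|p - q\|$. □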


From Theorem 3.3 we see that, in many cases, the joint manifold estimates of the geodesic
distances will be more accurate than the estimates obtained using one of the component manifolds.
For instance, if for a particular component manifold $\mathcal{M}_k$ we observe that

$$\rho_k \le \sqrt{\frac{\sum_{j=1}^{J} \rho_j^2}{J}},$$

then we know that the joint manifold leads to better estimates. Essentially, we can expect that the
joint manifold will lead to estimates that are better than the average case across the component
manifolds.
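A small worked example may help fix ideas. The Python sketch below considers two hypothetical isometric component manifolds parameterized by arc length $\theta$: a straight segment $(\theta, 0)$ and a circular arc $(r\cos(\theta/r), r\sin(\theta/r))$. For a pair of samples it computes the chord-to-geodesic ratio on each component and on the joint manifold, using the relation $d^2_{\mathcal{M}^*} = J\, d^2_{\mathcal{M}_1}$ with $J = 2$. The choice of manifolds, the radius, and the function names are assumptions made for illustration.

```python
import math


def component_ratios(t0, t1, r):
    """Chord/geodesic ratios for the segment and for the arc of radius r,
    both parameterized by arc length (valid while |t1 - t0| / r < 2*pi)."""
    gap = abs(t1 - t0)
    chord_segment = gap
    chord_arc = 2.0 * r * math.sin(gap / (2.0 * r))
    return chord_segment / gap, chord_arc / gap


def joint_ratio(t0, t1, r):
    """Chord/geodesic ratio on the joint manifold formed by concatenating
    the segment and the arc (J = 2 components)."""
    gap = abs(t1 - t0)
    chord_arc = 2.0 * r * math.sin(gap / (2.0 * r))
    joint_chord = math.sqrt(gap ** 2 + chord_arc ** 2)
    joint_geodesic = math.sqrt(2.0) * gap
    return joint_chord / joint_geodesic


if __name__ == "__main__":
    rho_segment, rho_arc = component_ratios(0.0, 1.0, 0.5)
    print(rho_segment, rho_arc, joint_ratio(0.0, 1.0, 0.5))
```

For a single pair of points the joint ratio equals the root-mean-square of the component ratios, so it lies above the ratio of the more curved component and below that of the flat one.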


We now consider the case where we have a sufficiently dense sampling of the manifolds so that
the $\rho_j$ are very close to one, and examine the case where we are obtaining noisy samples. We will
assume that the noise affecting the data samples is i.i.d., and demonstrate that any distance
calculation performed on $\mathcal{M}^*$ serves as a better estimator of the pairwise (and consequently,
geodesic) distances between two points labeled by $p$ and $q$ than that performed on any component
manifold between their corresponding points $p_j$ and $q_j$.

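Theorem 3.4. Let $\mathcal{M}^*$ be a joint manifold from $J$ isometric component manifolds. Let
$p = (p_1, p_2, \ldots, p_J)$ and $q = (q_1, q_2, \ldots, q_J)$ be samples of $\mathcal{M}^*$ and assume that
$\|p_j - q_j\| = d$ for all $j$. Assume that we acquire noisy observations $s = p + n$ and $r = q + n'$, where
$n = (n_1, n_2, \ldots, n_J)$ and $n' = (n'_1, n'_2, \ldots, n'_J)$ are independent noise vectors with the same
variance and norm bound

$$E[\|n_j\|^2] = \sigma^2 \quad \text{and} \quad \|n_j\|^2 \le \epsilon, \quad j = 1, \ldots, J.$$

Then,

$$P\left(1 - \delta \le \frac{\|s - r\|^2}{\|p - q\|^2 + 2J\sigma^2} \le 1 + \delta\right)
\ge 1 - 2\exp\left(-2J\delta^2\left(\frac{d^2 + 2\sigma^2}{2d\sqrt{\epsilon} + \epsilon}\right)^2\right).$$

Proof. We write the distance between the noisy samples as

$$\|s - r\|^2 = \sum_{j=1}^{J} \left\{\|p_j - q_j\|^2 + 2\langle p_j - q_j, n_j - n'_j\rangle + \|n_j - n'_j\|^2\right\}. \tag{22}$$

This can be rewritten as

$$\|s - r\|^2 - \|p - q\|^2 = \sum_{j=1}^{J} \left\{2\langle p_j - q_j, n_j - n'_j\rangle + \|n_j - n'_j\|^2\right\}.$$

We obtain the following statistics for the term inside the sum:

$$E\left[2\langle p_j - q_j, n_j - n'_j\rangle + \|n_j - n'_j\|^2\right] = 2\sigma^2,$$

$$\left|2\langle p_j - q_j, n_j - n'_j\rangle + \|n_j - n'_j\|^2\right| \le 2d\sqrt{\epsilon} + \epsilon.$$

Using Hoeffding's inequality, we obtain

$$P\left(\left|\sum_{j=1}^{J} \left\{2\langle p_j - q_j, n_j - n'_j\rangle + \|n_j - n'_j\|^2\right\} - 2J\sigma^2\right| > J\lambda\right)
\le 2\exp\left(-\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2}\right). \tag{23}$$

This result is rewritten to obtain

$$P\left(\left|\|s - r\|^2 - \|p - q\|^2 - 2J\sigma^2\right| > J\lambda\right) \le 2\exp\left(-\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2}\right),$$

$$P\left(\left|\|s - r\|^2 - \|p - q\|^2 - 2J\sigma^2\right| \le J\lambda\right) \ge 1 - 2\exp\left(-\frac{2J\lambda^2}{(2d\sqrt{\epsilon} + \epsilon)^2}\right).$$

Simplifying, and using $\|p - q\|^2 = Jd^2$, the event inside the last probability is

$$1 - \frac{\lambda}{d^2 + 2\sigma^2} \le \frac{\|s - r\|^2}{\|p - q\|^2 + 2J\sigma^2} \le 1 + \frac{\lambda}{d^2 + 2\sigma^2}.$$

Replace $\delta = \lambda/(d^2 + 2\sigma^2)$ to obtain the result. □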

We observe that the estimate of the true distance suffers from a constant small bias; this can
be handled using a simple debiasing step.² Theorem 3.4 indicates that the probability of large
deviations in the estimated distance decreases exponentially in the number of component manifolds
$J$; thus the "denoising" effect in joint manifold learning is manifested even in the case where only
a small number of component manifolds are present.
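The concentration effect can be observed in a small simulation. The Python sketch below draws bounded, zero-mean noise (uniform entries, an assumed noise model), forms the squared distance between noisy samples of $J$ components with $\|p_j - q_j\| = d$, subtracts the bias $2J\sigma^2$, and reports the average error of the resulting per-component estimate of $d^2$. The function names and all parameter values are hypothetical.

```python
import random


def debiased_distance_sq(J, N, d, a, rng):
    """One noisy, debiased estimate of d^2 from J components in R^N.
    Noise entries are uniform on [-a, a], so E||n_j||^2 = N a^2 / 3."""
    sigma2 = N * a * a / 3.0
    total = 0.0
    for _ in range(J):
        # Without loss of generality, p_j - q_j lies along the first axis.
        for i in range(N):
            diff = (d if i == 0 else 0.0) + rng.uniform(-a, a) - rng.uniform(-a, a)
            total += diff * diff
    # ||s - r||^2 has mean J d^2 + 2 J sigma^2; normalize and remove the bias.
    return total / J - 2.0 * sigma2


def mean_abs_error(J, N, d, a, trials, seed=0):
    """Average absolute error of the debiased estimate over many trials."""
    rng = random.Random(seed)
    errors = [abs(debiased_distance_sq(J, N, d, a, rng) - d * d)
              for _ in range(trials)]
    return sum(errors) / trials


if __name__ == "__main__":
    for J in (1, 4, 16, 64):
        print(J, mean_abs_error(J, N=8, d=1.0, a=0.5, trials=300))
```

In this setup the error shrinks steadily as $J$ grows, consistent with the exponential decay of large-deviation probabilities described above.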

As  an  example,  we  consider  three  different  manifolds  formed  by  images  of  an  ellipse  with 
major axis $a$ and minor axis $b$ translating in a 2-D plane; an example point is shown in Figure 3.
The eccentricity of the ellipse directly affects the condition number $1/\tau$ of the image articulation
manifold;  in  fact,  it  can  be  shown  that  articulation  manifolds  formed  by  more  eccentric  ellipses 
exhibit  higher  values  for  the  condition  number.  Consequently,  we  expect  that  it  is  “harder”  to  learn 
such  manifolds. 

² Manifold learning algorithms such as Isomap deal with biased estimates of distances by "centering" the matrix of
squared distances, i.e., removing the mean of each row/column from every element.


[Figure 3 panels: (i) (a, b) = (7, 7); (ii) (a, b) = (7, 6); (iii) (a, b) = (7, 5)]

Figure 3: Three articulation manifolds embedded in $\mathbb{R}^{4096}$ sharing a common 2-D parameter space $\Theta$.


Figure 4 shows that this is indeed the case. We add a small amount of white Gaussian noise
to each image and apply the Isomap algorithm [14] to both the individual datasets as well as the
concatenated dataset. We observe that the 2-D embedding is poorly learned in each of the individual
manifolds, but improves visibly when the data ensemble is modeled using a joint manifold.


4  Joint  manifolds  for  efficient  dimensionality  reduction 

We have shown that joint manifold models for data ensembles can significantly improve the
performance on a variety of signal processing tasks, where performance is quantified using metrics like
probability of error for detection and accuracy for parameter estimation and manifold learning. In
particular, we have observed that performance tends to improve exponentially fast as we increase
the number of component manifolds $J$. However, we have ignored that when $J$ and the ambient
dimension of the manifolds $N$ become large, the dimensionality of the joint manifold, $JN$,
may be so large that it becomes impossible to perform any meaningful computations. Fortunately,
we can transform the data into a more amenable form via the method of random projections; it has
been shown that the essential structure of a $K$-dimensional manifold with condition number $1/\tau$
residing in $\mathbb{R}^N$ is approximately preserved under an orthogonal projection into a random subspace
of dimension $O(K \log(N/\tau)) \ll N$ [18]. This result can be leveraged to enable efficient design of
inference applications, such as classification using multiscale navigation [19], intrinsic dimension
estimation, and manifold learning [20].

We can apply this result individually for each sensor acquiring manifold-modeled data. Suppose
$N$-dimensional data from $J$ component manifolds is available. If $N$ is large, then the above
result would suggest that we project each manifold into a lower-dimensional subspace. By collecting
this data at a central location, we would obtain $J$ vectors, each of dimension $O(K \log N)$, so
that we would have $O(JK \log N)$ total measurements. This approach, however, essentially ignores
the joint manifold structure present in the data. If we instead view the data as arising from a
$K$-dimensional joint manifold residing in $\mathbb{R}^{JN}$ with bounded condition number as given by
Theorem 2.2, we can then project the joint data into a subspace whose dimension is only logarithmic in
$J$ and in the largest condition number among the components, and still approximately preserve the
manifold structure. This is formalized in the following theorem.

Figure 4: Results of Isomap applied to the translating ellipse image data sets.


Theorem 4.1. Let $\mathcal{M}^*$ be a compact, smooth, Riemannian joint manifold in a $JN$-dimensional
space with condition number $1/\tau^*$. Let $\Phi$ denote an orthogonal linear mapping from $\mathcal{M}^*$ into
a random $M$-dimensional subspace of $\mathbb{R}^{JN}$. Let $M \ge O(K \log(JN/\tau^*)/\epsilon^2)$. Then, with high
probability, the geodesic and Euclidean distances between any pair of points on $\mathcal{M}^*$ are preserved
up to distortion $\epsilon$ under the linear transformation $\Phi$.

Thus, we obtain a faithful approximation of our manifold-modeled data that is only $O(K \log JN)$
dimensional. This represents a significant improvement over performing separate dimensionality
reduction on each component manifold.

Importantly, the linear nature of the random projection step can be utilized to perform
dimensionality reduction in a distributed manner, which is particularly useful in applications when data
transmission is expensive. As an example, consider a network of $J$ sensors observing an event that
is governed by a $K$-dimensional parameter. Each sensor records a signal $x_j \in \mathbb{R}^N$, $1 \le j \le J$;
the concatenation of the signals $x^* = [x_1^T\ x_2^T\ \ldots\ x_J^T]^T$ lies on a $K$-dimensional joint manifold
$\mathcal{M}^* \subset \mathbb{R}^{JN}$. Since the required random projections are linear, we can take local random
projections of the observed signals at each sensor, and still calculate the global measurements of
$\mathcal{M}^*$ in a distributed fashion. Let each sensor obtain its measurements $y_j = \Phi_j x_j$, with the
matrices $\Phi_j \in \mathbb{R}^{M \times N}$, $1 \le j \le J$. Then, by defining the $M \times JN$ matrix
$\Phi^* = [\Phi_1\ \ldots\ \Phi_J]$, our global projections $y^* = \Phi^* x^*$ can be obtained by

$$\begin{aligned}
y^* &= \Phi^* x^* \\
&= [\Phi_1\ \Phi_2\ \ldots\ \Phi_J]\,[x_1^T\ x_2^T\ \ldots\ x_J^T]^T \\
&= \Phi_1 x_1 + \Phi_2 x_2 + \cdots + \Phi_J x_J.
\end{aligned}$$

Thus, the final measurement vector can be obtained by simply adding independent random
projections of the signals acquired by the individual sensors. This method enables a novel scheme
for compressive, multi-modal data fusion; in addition, the number of random projections required
by this scheme is only logarithmic in the number of sensors $J$. Thus, the joint manifold
framework naturally lends itself to a network-scalable data aggregation technique for
communication-constrained applications.
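The fusion identity can be checked directly. The following Python sketch simulates $J$ sensors that each apply a local $M \times N$ matrix to their own signal, sums the local measurements, and compares the result with the centralized product of the concatenated matrix and the concatenated signal. Gaussian matrices are used here as a simple stand-in for the random orthogonal projections discussed above (the identity itself holds for any matrices); all function names and sizes are hypothetical.

```python
import random


def random_matrix(M, N, rng):
    """M x N matrix with i.i.d. standard Gaussian entries (assumed stand-in
    for a random projection)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def fuse(local_measurements):
    """Add the M-dimensional measurement vectors from all sensors."""
    return [sum(values) for values in zip(*local_measurements)]


def simulate(J, N, M, seed=0):
    """Return (distributed, centralized) measurements for J sensors."""
    rng = random.Random(seed)
    Phis = [random_matrix(M, N, rng) for _ in range(J)]
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(J)]
    # Distributed: each sensor projects locally, then the results are added.
    distributed = fuse([matvec(Phi, x) for Phi, x in zip(Phis, xs)])
    # Centralized: concatenate matrices column-wise and signals end to end.
    Phi_star = [sum((Phi[m] for Phi in Phis), []) for m in range(M)]
    x_star = [value for x in xs for value in x]
    centralized = matvec(Phi_star, x_star)
    return distributed, centralized


if __name__ == "__main__":
    distributed, centralized = simulate(J=5, N=20, M=4)
    print(max(abs(u - v) for u, v in zip(distributed, centralized)))
```

Only $M$ numbers per sensor need to be transmitted and summed, regardless of $N$, which is what makes the scheme attractive when communication is the bottleneck.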


5  Discussion 

Joint  manifolds  naturally  capture  the  structure  present  in  a  variety  of  signal  ensembles  that  arise 
from  multiple  observations  of  a  single  event  controlled  by  a  small  set  of  global  parameters.  We 
have  examined  the  properties  of  joint  manifolds  that  are  relevant  to  real-world  applications,  and 
provided some basic examples that illustrate how they improve performance and help reduce
complexity.

We have also introduced a simple framework for dimensionality reduction for joint manifolds
that employs independent random projections from each signal, which are then added together
to obtain an accurate low-dimensional representation of the data ensemble. This distributed
dimensionality reduction technique resembles the acquisition framework proposed in compressive
sensing (CS) [21,22]; in fact, prototypes of inexpensive sensing hardware [23,24] that can directly
acquire random projections of the sensed signals have already been built. Our fusion scheme can
be directly applied to the data acquired by such sensors. Joint manifold fusion via random
projections, like CS, is universal in the sense that the measurement process is not dependent on the
specific structure of the manifold. Thus, our sensing techniques need not be replaced for these
extensions; only our underlying models (hypotheses) are updated.

The richness of manifold models allows for the joint manifold approach to be successfully
applied in a larger class of problems than principal component analysis and other linear model-based
signal processing techniques. In fact, joint manifolds can be immediately applied in signal
processing tasks where manifold models are common, such as detection, classification, and parameter
estimation. When these tasks are performed in a sensor network or array, and random projections
of the captured signals can be obtained, joint manifold techniques provide improved performance
by leveraging the information from all sensors simultaneously.